Abstract. Based on the formation of triad junctions, the proposed mechanism 
generates networks that exhibit extended rather than single power law behavior. 
Triad formation guarantees strong neighborhood clustering and community-level 
characteristics as the network size grows to infinity. The asymptotic behavior is of 
interest in the study of directed networks in which (i) the formation of links cannot 
be described according to the principle of preferential attachment; (ii) the in-degree 
distribution fits a power law for nodes with a high degree and an exponential form 
otherwise; (iii) clustering properties emerge at multiple scales and depend on both 
the number of links that newly added nodes establish and the probability of forming 
triads; and (iv) groups of nodes form modules that feature fewer links to the rest of the 
nodes. 



PACS numbers: 89.75.Da, 89.75.Fb 
1. Introduction 

Networks are systems composed of well-defined elements (nodes) that display collective 
behaviors at multiple levels of analysis. Large networks arise by the gradual addition 
of elements that attach to an existing and often evolving network component. With 
our modern access to data, network techniques offer a wide set of 
mathematical tools to visualize data at the level of individual elements and the interactions 
between them. These tools allow us to characterize higher-level properties of the 
structure of a system and to identify different types of patterns in the relationships 
among elements. 

The development of models that describe the evolution of networks has been driven 
by the need to analyze large amounts of relational data across a wide range of fields. 
Well-known examples include the study of relationships in scientific collaborations 
[1], export goods [2], traffic [3], social ties, stocks [5], and patent citations [6]-[8]. 
To address the question of how particular topologies arise as networks grow, a 
large body of work has been devoted to understanding the emergence of three properties: 
the distribution of links per node (degree distribution), the proportion of links grouped 
into local neighborhoods (clustering or transitivity) [9], and the division of the 
set of nodes into modules (communities) with tight interconnections within them and sparser 
links across them. 

In extended power law networks, the probability p_k that a node with a low degree 
of connectivity (below some threshold ε) connects to k other nodes fits an exponential 
form e^{−λk} for some positive constant λ. For nodes with a high degree, the probability p_k 
is proportional to the power law function k^{−α} for some positive constant α. Because the 
tail of the degree distribution has no exponential bound, node degrees in power law 
networks span several orders of magnitude, with a few nodes being highly connected. 
Mechanisms leading to power law networks are reviewed in [12]. A particular class of 
mechanisms, in which nodes with a high degree have a greater probability of acquiring new 
links (the principle of preferential attachment), has been proposed to explain the scaling 
behavior in empirical data [13], [14]. 

In clustered networks, the probability of finding transitive triplets is higher than 
expected by random chance. If a node connects to two other nodes, 
clustering captures the probability that these two nodes are connected, too. In a network 
with high clustering, nodes do not interact homogeneously with other nodes, but tend to 
influence each other locally (i.e., they form strong neighborhood clusters [15]). Common 
measures of clustering are based on (i) the total number of transitive triplets relative to 
the total number of possible triplets in the network, represented by a global clustering 
coefficient C [10]; or (ii) the fraction of triplets connecting the neighboring nodes of node 
i over the total number of possible triplets, represented by a local clustering coefficient 
C_i [16]. Real-world networks show clustering coefficients that are generally independent 
of the size of the network and scale with the degree of the nodes [17]. 
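As a concrete illustration of the two measures, the following Python sketch computes the global coefficient C (closed connected triplets over all connected triplets) and the local coefficient C_i of every node of a small graph. It is a minimal sketch: the function name `clustering` and the adjacency-dict input are our own choices, and edge directions are ignored, which is an assumption since the networks studied below are directed.

```python
from itertools import combinations


def clustering(adj):
    """Return (global C, {node: local C_i}) for an undirected graph.

    adj maps each node to the set of its neighbors; it must be symmetric.
    """
    closed = 0     # links among neighbors, summed over nodes (3 x number of triangles)
    triplets = 0   # connected triplets, i.e. pairs of neighbors, summed over nodes
    local = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        possible = k * (k - 1) // 2
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        local[i] = links / possible if possible else 0.0
        closed += links
        triplets += possible
    global_c = closed / triplets if triplets else 0.0
    return global_c, local


# A triangle 0-1-2 with a pendant node 3 attached to node 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
C, C_local = clustering(graph)
print(C)           # 0.6
print(C_local[0])  # 0.333...
```

Counting links among neighbors at every node counts each triangle three times, which is exactly the numerator the global (transitivity) definition needs.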
In networks with community structure, the division of the set of nodes into 
modules underlies their dynamic formation. Nodes may group according to particular 
characteristics (types), reflecting a tendency to establish stronger ties with similar others 
(e.g., according to interests, occupation, or beliefs) [18]. Under this proposition, the 
modularity of a network captures the difference between the fraction of edges 
within communities and the expected value of that fraction for a random network 
(Q-modularity) [11]. A modularity Q > 0.3 suggests the existence of a well-defined community 
structure (often found in social and information networks). 
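The sketch below computes Q for a given partition of an undirected graph using the standard form Q = Σ_c [e_c/M − (d_c/(2M))^2], where M is the number of edges, e_c the number of edges inside community c, and d_c the total degree of the nodes in c. The function `modularity` and its inputs are hypothetical helpers written for illustration, not code from this paper.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    M = len(edges)
    inside = {}   # e_c: number of edges with both endpoints in community c
    degree = {}   # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / M - (d / (2 * M)) ** 2 for c, d in degree.items())


# Two disjoint triangles, each placed in its own community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, labels))  # 0.5
```

Putting every node into a single community gives Q = 0, the random-network baseline against which the Q > 0.3 rule of thumb is read.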

Though preferential attachment offers an explanation for the existence of networks 
with power law degree distributions, it does not, by itself, explain the formation of 
strong neighborhood clusters: clustering coefficients tend to vanish with the continuous 
addition of new nodes (under both local and global preferential attachment 
mechanisms [19]). Developing alternative models that explain strong 
neighborhood clustering as a natural outcome of the growth process contributes 
towards a framework that supports the analysis of the clustering behavior 
of power law networks. 

Based on the principle of preferential attachment, the authors of [20], [21] introduce 
a baseline probability of establishing additional links through a process of triad formation. 
They generate undirected networks with tunable degree distributions and clustering 
properties. In [21] the authors deduce analytical results based on generic conditions 
underlying local attachment mechanisms. Unlike [20], [21], the work in [22] explains 
power law behavior in networks in which the process of establishing links does not 
necessarily depend on preferential attachment: new nodes attach according to a uniform 
random distribution, followed by the formation of triad junctions. 
Like [22], the formation mechanism in this paper does not instantiate the principle of 
preferential attachment. 

Although the model in [22] generates extended (rather than single) power laws in 
the in-degree distribution of strongly clustered networks [23], [21], it does not describe 
the threshold that marks the transition from an exponential fit to a power law. Here, 
we deduce analytical expressions for (i) the exponential exponent λ that characterizes 
the behavior of nodes with a low degree; (ii) the threshold ε above which nodes follow 
a power law; and (iii) the relationships between the clustering coefficients and the 
value of ε. The expressions for the degree distribution and the clustering coefficients 
(all dependent on ε) imply that common factors drive the formation of 
structure during network growth. Unlike the work in [22], the proposed mechanism 
rests on an immediate implementation of the principle of triad formation (i.e., a triad 
may be formed after every random attachment, as opposed to forming triads by choosing 
a node from the union of the sets of all neighbors after all random attachments). This 
difference in the process of triad formation yields expressions for both global and local 
clustering coefficients that, unlike the expressions in [22], depend on the threshold ε. 

The contribution of the proposed mechanism is threefold. First, it explains scaling 
behavior in networks with an extended power law in their in-degree distribution (offering 
a better fit than a single or double power law distribution to describe social and 
information networks [7], [25]). Second, it accounts for strong neighborhood clustering 
based on a random triad formation process with a positive stationary mean probability; 
clustering properties remain constant as the size of the network grows to infinity. Third, 
it explores the formation of communities by allowing nodes to establish stronger ties 
with nodes of the same type (group preference). 

The remaining sections are organized as follows. First, we introduce a model of the 
connectivity of a network that grows through the continuous addition of new nodes. 
Theorem 1 shows that the in-degree distribution follows a power law with scaling 
exponent α above a certain threshold ε, and an exponential distribution with exponential 
exponent λ otherwise. We present analytical results for the values of α, λ, and ε. The 
results suggest that the transition from exponential to power law behavior depends on 
both the scaling exponent (which, in turn, depends on the probability of forming triads) 
and the number of links that newly added nodes establish. Theorem 2 characterizes the 
evolution of the global and local clustering coefficients and presents asymptotic 
expressions for C and C_i (both depend on ε). Second, we characterize the relationship 
between the formation of triads and the scaling exponent of the network. Simulations also 
show the effect of group preference and network modularity on α, λ, C, and C_i. Third, we 
apply the proposed mechanism to generate realizations that resemble the degree 
distribution and clustering properties of an empirical network with no directed cycles: 
the opinions written by the U.S. Supreme Court and the cases they cite [26]. We discuss 
how the model contributes to the understanding of this evolving semantic topology and, 
more generally, how it identifies generic conditions that lead to the formation of 
structure as these types of acyclic directed networks grow. Finally, we draw conclusions 
and outline future research directions. 

2. A network formation model 

Let the graph G_t = (H_t, A_t) represent the network at time index t. The set 
A_t = {(i, j) : i, j ∈ H_t} represents the relationships between a finite set of interconnected 
nodes that belong to H_t = {1, ..., N_t}. The pair (i, j) indicates that there exists a directed 
edge from node i to node j, and q_i(t) = {j ∈ H_t : (j, i) ∈ A_t} represents all nodes that link 
to node i (i.e., its incoming neighbors at time t). For any node i ∈ H_t, let k_i(t) = |q_i(t)| 
represent the in-degree of node i. 

2.1. Node attachment 

Every time index t a new node attaches to m different nodes, selected according to 
a uniformly random distribution over H_{t−1}. Let n ≥ 0 denote the number of edges 
established from nodes in H_{t−1} to the newly added node, according to some mechanism 
that responds to the attachment. If there is no such response underlying the node 
attachment process, then n = 0 (e.g., for a network with no directed cycles). 
2.2. Triad formation 

The requirements for the formation of triad junctions are similar to the conditions 
introduced in [20]. When node j ∉ H_{t−1} attaches to some node j' ∈ H_{t−1}, it may also 
establish an additional link to one of the outgoing neighbors of node j', selected again 
according to a uniformly random distribution. If j ∈ q_{j'}(t) and j' ∈ q_i(t) for some node 
i, node j links to node i with probability X_i(t). A multivariate random variable X_t with 
a positive expected probability p_t = E[X_t] = ∫⋯∫ f(σ_1, ..., σ_s) dσ_1 ⋯ dσ_s captures the 
set of possible probabilities of establishing a link between nodes j and i, where 
σ_1, ..., σ_s are independent factors that influence the formation of triads. Note that if 
the set of outgoing neighbors of node j' is a subset of the set of outgoing neighbors of 
node j, then no additional link can be established through triad formation. The process 
repeats for every edge established by a newly added node (m times) before another node 
may attach to the network. Let X = {X_t}, with stationary mean p > 0, be the random process 
associated with triad formation. 
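To make the two-step process concrete, here is a minimal Python sketch of the growth rule. It assumes n = 0 (no edges back to the new node), treats X_t as a constant triad probability p (so each triad step succeeds with probability p), and labels the initial nodes 0, ..., N_0 − 1; the function name `grow` and the dict-of-sets representation are our own illustrative choices, not the authors' implementation.

```python
import random


def grow(out_nbrs, steps, m, p, seed=None):
    """Grow a directed network by uniform random attachment plus triad formation.

    out_nbrs: dict node -> set of nodes it links to (the initial network G_0),
              with nodes labelled 0, ..., N_0 - 1.
    Assumptions (ours): n = 0 and a constant triad probability X_i(t) = p.
    Returns the grown out_nbrs dict (modified in place) and the in-degrees k_i.
    """
    rng = random.Random(seed)
    nodes = list(out_nbrs)
    for _ in range(steps):
        j = len(nodes)                       # label of the newly added node
        links = set()
        for jp in rng.sample(nodes, m):      # m distinct targets, uniform over H_{t-1}
            links.add(jp)                    # random attachment j -> j'
            candidates = list(out_nbrs[jp])  # outgoing neighbors of j'
            if candidates and rng.random() < p:
                # Triad formation: j -> i for a uniformly chosen i with j' -> i.
                # If j already links to i, no new edge is created.
                links.add(rng.choice(candidates))
        out_nbrs[j] = links
        nodes.append(j)
    in_degree = {v: 0 for v in out_nbrs}
    for outs in out_nbrs.values():
        for v in outs:
            in_degree[v] += 1
    return out_nbrs, in_degree


# Initial network: a directed cycle on N_0 = 12 nodes (every node has an out-neighbor).
g0 = {i: {(i + 1) % 12} for i in range(12)}
graph, k = grow(g0, steps=1000, m=2, p=0.5, seed=42)
```

With m = 1 and p = 1 every new node creates exactly two edges (one random attachment and one triad edge), which gives an easy sanity check; with p = 0 the process reduces to uniform random attachment.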

Assumption 1 (on the initial network): To ensure that the two-step mechanism (growth 
plus triad formation) can be properly completed, we require that (a) the network G_0 is 
weakly connected; and (b) the network G_0 has at least m nodes, each with at least one 
outgoing neighbor. 

Assumption 1(a) is satisfied if replacing all the directed edges with undirected ones 
produces a connected undirected graph. Assumption 1(b) means that N_0 ≥ m and, for 
every node i ∈ H_0, there exists a node i' such that i ∈ q_{i'}(0). This last condition is 
required when p = 1. 
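Assumption 1 is easy to check mechanically. The following sketch, using a dict-of-sets representation in which every node appears as a key, tests (a) weak connectivity with a depth-first search on the undirected version of G_0 and (b) N_0 ≥ m with every node having at least one outgoing neighbor. The function name is hypothetical.

```python
def satisfies_assumption_1(out_nbrs, m):
    """Check Assumption 1 for an initial directed network G_0.

    out_nbrs: dict node -> set of nodes it links to; every node must be a key.
    (a) G_0 is weakly connected (connected once directions are ignored);
    (b) G_0 has at least m nodes, each with at least one outgoing neighbor.
    """
    nodes = set(out_nbrs)
    if not nodes or len(nodes) < m or any(not out_nbrs[v] for v in nodes):
        return False
    # Undirected version of G_0.
    undirected = {v: set() for v in nodes}
    for u, outs in out_nbrs.items():
        for v in outs:
            undirected[u].add(v)
            undirected[v].add(u)
    # Depth-first search from an arbitrary node.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in undirected[u] - seen:
            seen.add(v)
            stack.append(v)
    return seen == nodes


print(satisfies_assumption_1({i: {(i + 1) % 12} for i in range(12)}, m=3))  # True
```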

3. Analysis 

We now show that the mechanism guarantees topological properties of both the 
in-degree distribution and the clustering coefficients of the network. 

Theorem 1 (in-degree distribution): For all G_0 that satisfy Assumption 1, the in-degree 
distribution p_k of G_t follows an extended power law as t → ∞. The scaling and 
exponential exponents are α = 2 + 1/p and λ = α/ε, with threshold ε = (α − 1)m. 

Proof. We assume that the in-degree of node i is a continuous variable k_i ∈ ℝ, k_i ≥ 0. 
Every time index t a newly added node j ∉ H_{t−1} attaches to m different nodes in H_{t−1}, 
selected according to a uniform distribution over the N_0 + t − 1 existing nodes. 
The probability that node j attaches at time t to a node i ∈ H_{t−1} is 

$$\frac{m}{N_0 + t - 1}.$$

The triad formation step that (immediately) follows random attachment adds to the 
rate of change of node i with in-degree k_i(t − 1) by 

$$\frac{m\,k_i(t)}{N_0 + t - 1}\cdot\frac{1}{m(1+p)}\cdot p.$$

The first term is the probability of selecting, during random attachment, an 
incoming neighbor of node i (i.e., some node j' ∈ q_i(t)). The second term, 1/(m(1 + p)), is 
the probability that node i is the one selected among the (on average) m(1 + p) outgoing 
neighbors of node j'. Furthermore, the probability p is the stationary mean of the random 
process of forming triads. The product of all three terms defines the probability of 
forming a triplet with an edge that contributes to the in-degree of node i. Thus, the 
overall rate of change of k_i(t) is 

$$\frac{dk_i(t)}{dt} = \frac{m}{N_0 + t - 1} + \frac{p}{1+p}\,\frac{k_i(t)}{N_0 + t - 1} \tag{1}$$

with boundary condition k_i(t_i) = n. The solution to (1) is 

$$k_i(t) = \left(n + \left(1 + \frac{1}{p}\right)m\right)\left(\frac{N_0 + t - 1}{N_0 + t_i - 1}\right)^{\frac{p}{1+p}} - \left(1 + \frac{1}{p}\right)m. \tag{2}$$
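Using (2), the cumulative distribution of the in-degree of node i satisfies 

$$P[k_i(t) < k] = P\left[t_i > (N_0 + t - 1)\left(\frac{n + (1 + \frac{1}{p})m}{k + (1 + \frac{1}{p})m}\right)^{\frac{1+p}{p}} - (N_0 - 1)\right].$$

Because nodes are added one per time index, t_i is uniformly distributed over the N_0 + t 
nodes, and as t → ∞ the right-hand threshold divided by N_0 + t tends to the bracketed 
ratio raised to the power (1 + p)/p. Finally, 

$$P[k_i(t) < k] = 1 - \left(\frac{n + (1 + \frac{1}{p})m}{k + (1 + \frac{1}{p})m}\right)^{\frac{1+p}{p}} \tag{3}$$

$$p_k = \frac{dP[k_i(t) < k]}{dk} = a\left(k + \left(1 + \frac{1}{p}\right)m\right)^{-\left(2 + \frac{1}{p}\right)} \tag{4}$$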

where a = (1 + 1/p)(n + (1 + 1/p)m)^{1 + 1/p}. Note that (4) exhibits an extended power law 
of the form p_k ∝ (k + ε)^{−α}, 
where α = 2 + 1/p and ε = (α − 1)m. When k ≫ ε, (4) reduces to a single power law 
p_k ~ k^{−α}. On the other hand, when k ≪ ε we have 

$$\ln p_k \sim -\alpha\ln(k + \varepsilon) = -\alpha\left[\ln\left(1 + \frac{k}{\varepsilon}\right) + \ln\varepsilon\right] \approx -\alpha\left[\frac{k}{\varepsilon} + \ln\varepsilon\right]$$

and obtain 

$$p_k \sim \varepsilon^{-\alpha}\exp\left(-\alpha\,\frac{k}{\varepsilon}\right).$$

Thus, (4) is proportional to the exponential form p_k ~ exp(−λk) with λ = α/ε. 

□ 
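The closed forms in the proof can be checked numerically. The sketch below integrates (1) with a classical fourth-order Runge-Kutta scheme and compares the result with (2), and checks by central finite differences that (4) is the derivative of (3). The parameter values are arbitrary, the helper names are our own, and the check assumes 0 < p ≤ 1.

```python
def k_closed_form(t, t_i, n, m, p, N0):
    """Equation (2): in-degree at time t of the node added at time t_i."""
    c = (1 + 1 / p) * m
    return (n + c) * ((N0 + t - 1) / (N0 + t_i - 1)) ** (p / (1 + p)) - c


def k_rk4(t, t_i, n, m, p, N0, steps=10_000):
    """Integrate equation (1) from k(t_i) = n with fourth-order Runge-Kutta."""
    def rate(s, k):
        return (m + p / (1 + p) * k) / (N0 + s - 1)

    h = (t - t_i) / steps
    s, k = float(t_i), float(n)
    for _ in range(steps):
        r1 = rate(s, k)
        r2 = rate(s + h / 2, k + h * r1 / 2)
        r3 = rate(s + h / 2, k + h * r2 / 2)
        r4 = rate(s + h, k + h * r3)
        k += h * (r1 + 2 * r2 + 2 * r3 + r4) / 6
        s += h
    return k


def cdf(k, n, m, p):
    """Equation (3): limiting P[k_i(t) < k]."""
    c = (1 + 1 / p) * m
    return 1 - ((n + c) / (k + c)) ** ((1 + p) / p)


def pdf(k, n, m, p):
    """Equation (4): p_k = a (k + c)^-(2 + 1/p), a = (1 + 1/p)(n + c)^(1 + 1/p)."""
    c = (1 + 1 / p) * m
    a = (1 + 1 / p) * (n + c) ** (1 + 1 / p)
    return a * (k + c) ** (-(2 + 1 / p))


print(k_closed_form(1000, 5, 0, 2, 0.5, 12), k_rk4(1000, 5, 0, 2, 0.5, 12))
```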



Remarks: Theorem 1 implies that, as the network grows, the scaling exponent of the 
in-degree distribution depends on the stationary mean of forming triads. The distribution 
follows a strict power law for nodes with a degree greater than (α − 1)m and an 
exponential fit otherwise. The left frame of Figure 1 shows the value of the scaling 
exponent α for different values of p. Note that, because 0 < p ≤ 1, the mechanism 
generates network realizations with scaling exponent α ≥ 3. 
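The quantities in Theorem 1 follow directly from p and m. This sketch computes α, ε, and λ and compares the unnormalized shape (k + ε)^{−α} with its two limiting forms, ε^{−α} e^{−λk} for k ≪ ε and k^{−α} for k ≫ ε. The function names are illustrative.

```python
import math


def exponents(p, m):
    """Theorem 1: scaling exponent alpha, threshold eps, exponential exponent lam."""
    alpha = 2 + 1 / p
    eps = (alpha - 1) * m
    lam = alpha / eps
    return alpha, eps, lam


def extended_power_law(k, p, m):
    """Unnormalized shape (k + eps)^(-alpha) of equation (4)."""
    alpha, eps, _ = exponents(p, m)
    return (k + eps) ** (-alpha)


def low_degree_form(k, p, m):
    """Approximation for k << eps: eps^(-alpha) * exp(-lam * k)."""
    alpha, eps, lam = exponents(p, m)
    return eps ** (-alpha) * math.exp(-lam * k)


def high_degree_form(k, p, m):
    """Approximation for k >> eps: k^(-alpha)."""
    alpha, _, _ = exponents(p, m)
    return k ** (-alpha)


print(exponents(1.0, 1))  # (3.0, 2.0, 1.5)
```

For p = 1 this gives α = 3, ε = 2m, and λ = 3/(2m), the values that appear in the simulations of Section 4.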



Theorem 2 (clustering coefficients): For all G_0 that satisfy Assumption 1, the global 
clustering coefficient of G_t tends to C = (α − 2)/((α − 1)ε) as t → ∞. The asymptotic 
behavior of the local clustering coefficient for a node with in-degree k_i = k follows 

$$C_i(k) = \frac{2\left(k + pm + (2 + p - \alpha)\,m\ln\frac{k + \varepsilon}{n + \varepsilon}\right)}{(k + p\varepsilon)(k + p\varepsilon - 1)}.$$

Proof. Note that the only edge configuration that forms transitive triplets is when node 
j ∉ H_{t−1} attaches to j' ∈ H_{t−1} such that j ∈ q_{j'}(t) and there exists a node i ∈ H_{t−1} 
such that j' ∈ q_i(t). A triad is formed if node j establishes a third edge, to node i, that 
connects nodes j, j', and i. The expected number of such third edges that close triplets is 
pm. Moreover, when node j attached to the network, it connected (on average) to 
m(1 + p) outgoing neighbors (because node j established m edges according to the 
attachment process and then an expected pm additional edges according to the 
process of triad formation). Each outgoing neighbor of node j also has (on average) 
m(1 + p) outgoing neighbors. Thus, there are m^2(1 + p)^2 different possible pairs to form 
triplets. The global clustering coefficient is given by 

$$C = \frac{pm}{m^2(1+p)^2} = \frac{p}{m(1+p)^2}, \tag{5}$$

which can also be expressed in terms of ε as C = (α − 2)/((α − 1)ε). 

Next, to capture the local clustering coefficient of a node i, note that the number of 
possible pairs of incoming and outgoing edges of node i (with in-degree k_i = k) is given 
by 

$$\binom{k + m(1+p)}{2} = \frac{\big(k + m(1+p)\big)\big(k + m(1+p) - 1\big)}{2}. \tag{6}$$

Equation (6) captures the total number of possible triplets that involve node i. Now, 
to capture the number of actual triplets that involve node i, we consider three possible 
scenarios about the edges that may lead to triad formation: node i has (i) two outgoing 
edges; (ii) an outgoing edge and an incoming edge that was established through random 
attachment; or (iii) two incoming edges, at least one of which was 
established through triad formation. 
In scenario (i), there are an expected 

$$pm \tag{7}$$

connected triplets. 

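In scenario (ii), the number of incoming edges created through random attachment evolves as 

$$\frac{dk^*(t)}{dt} = \frac{m}{N_0 + t - 1} \tag{8}$$

with initial condition k*(t_i) = 0 (note that at t = t_i the newly added node i cannot have 
incoming edges that were established through random attachment). The solution to (8) 
is 

$$k^*(t) = m\ln\frac{N_0 + t - 1}{N_0 + t_i - 1}. \tag{9}$$

Moreover, using (2) we also know that for node i with in-degree k_i(t) = k 

$$\frac{N_0 + t - 1}{N_0 + t_i - 1} = \left(\frac{k + (1 + \frac{1}{p})m}{n + (1 + \frac{1}{p})m}\right)^{\frac{1+p}{p}}. \tag{10}$$

Replacing (10) in (9) we obtain 

$$k^*(t) = \left(1 + \frac{1}{p}\right)m\ln\frac{k + (1 + \frac{1}{p})m}{n + (1 + \frac{1}{p})m}.$$

Because each of these edges leads to a third edge that closes the triplet with probability p, 
the expected number of closed triplets in scenario (ii) is 

$$p\left(1 + \frac{1}{p}\right)m\ln\frac{k + (1 + \frac{1}{p})m}{n + (1 + \frac{1}{p})m}. \tag{11}$$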
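For scenario (iii), the number of incoming edges that were established through triad 
formation is given by 

$$k - \left(1 + \frac{1}{p}\right)m\ln\frac{k + (1 + \frac{1}{p})m}{n + (1 + \frac{1}{p})m}, \tag{12}$$

and each of these edges closes a triplet by construction. Finally, dividing the sum of (7), 
(11), and (12) by (6), we obtain 

$$C_i(k) = \frac{2\left(k + pm + (p - 1)\left(1 + \frac{1}{p}\right)m\ln\frac{k + (1 + \frac{1}{p})m}{n + (1 + \frac{1}{p})m}\right)}{\big(k + (1+p)m\big)\big(k + (1+p)m - 1\big)} = \frac{2\left(k + pm + (2 + p - \alpha)\,m\ln\frac{k + \varepsilon}{n + \varepsilon}\right)}{(k + p\varepsilon)(k + p\varepsilon - 1)}, \tag{13}$$

where α = 2 + 1/p and ε = (α − 1)m. 

□ 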
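Both forms of (5) and of (13) can be cross-checked numerically. The sketch below transcribes the formulas, reading (13) as C_i(k) = 2(k + pm + (2 + p − α) m ln((k + ε)/(n + ε)))/((k + pε)(k + pε − 1)), evaluates each quantity both ways, and confirms that k·C_i(k) approaches 2 for large k, i.e., C_i(k) ∝ k^{−1}. The function names are our own.

```python
import math


def global_clustering(p, m):
    """Equation (5): C = p / (m (1 + p)^2)."""
    return p / (m * (1 + p) ** 2)


def global_clustering_eps(p, m):
    """Equation (5) rewritten as (alpha - 2) / ((alpha - 1) eps)."""
    alpha = 2 + 1 / p
    eps = (alpha - 1) * m
    return (alpha - 2) / ((alpha - 1) * eps)


def local_clustering(k, p, m, n=0):
    """Equation (13), first form: [(7) + (11) + (12)] divided by (6)."""
    c = (1 + 1 / p) * m                       # equals eps
    log_term = math.log((k + c) / (n + c))
    closed = p * m + p * c * log_term + (k - c * log_term)
    pairs = (k + (1 + p) * m) * (k + (1 + p) * m - 1) / 2
    return closed / pairs


def local_clustering_eps(k, p, m, n=0):
    """Equation (13), second form, written with alpha and eps."""
    alpha = 2 + 1 / p
    eps = (alpha - 1) * m
    num = 2 * (k + p * m + (2 + p - alpha) * m * math.log((k + eps) / (n + eps)))
    return num / ((k + p * eps) * (k + p * eps - 1))


print(global_clustering(1.0, 1))  # 0.25
```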



Remarks: Theorem 2 implies that the values of C and C_i depend neither on the 
initial network G_0 nor on the size of G_t (i.e., the clustering coefficients do not vanish as 
the network grows). The right frame of Figure 1 shows the value of the global clustering 
coefficient C for different values of p and m. Note that the model captures an inverse 
relationship between the clustering behavior and the number of edges established during 
every attachment. 




Figure 1. Scaling exponent α for different values of p (left frame); and global clustering 
coefficient C for different values of p and m (right frame). 

The left plot of Figure 2 shows the effect of α on the local clustering coefficient. For 
nodes with a low degree (below pε), high values of α lead to strong neighborhood clusters. 
For nodes with a high degree, the effect is the opposite and the local clustering 
coefficient is proportional to k^{−1} (a behavior observed in empirical data [17]). As for 
the global clustering coefficient, the right plot of Figure 2 shows an inverse relationship 
between the average clustering coefficient C_av = ∫ p_k C_i(k) dk and the value of m. 
Finally, note also that the average clustering coefficient is slightly greater than the 
global clustering coefficient (as also observed in empirical measures of clustering [10]). 
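The average clustering coefficient C_av = ∫ p_k C_i(k) dk is straightforward to evaluate numerically. The sketch assumes the density (4) written as p_k = (α − 1)(n + ε)^{α−1}(k + ε)^{−α} on k ≥ n, and the local coefficient C_i(k) = 2(k + pm + (2 + p − α) m ln((k + ε)/(n + ε)))/((k + pε)(k + pε − 1)); it maps [n, ∞) onto [0, 1) with k = n + ε s/(1 − s) and applies the midpoint rule. Helper names are hypothetical.

```python
import math


def p_k(k, p, m, n=0):
    """Equation (4): (alpha - 1)(n + eps)^(alpha - 1) (k + eps)^(-alpha)."""
    alpha = 2 + 1 / p
    eps = (alpha - 1) * m
    return (alpha - 1) * (n + eps) ** (alpha - 1) * (k + eps) ** (-alpha)


def local_clustering(k, p, m, n=0):
    """Local clustering coefficient C_i(k) of equation (13)."""
    alpha = 2 + 1 / p
    eps = (alpha - 1) * m
    num = 2 * (k + p * m + (2 + p - alpha) * m * math.log((k + eps) / (n + eps)))
    return num / ((k + p * eps) * (k + p * eps - 1))


def integrate_over_degrees(f, p, m, n=0, points=20_000):
    """Midpoint rule for the integral of f(k) over k in [n, inf).

    The substitution k = n + eps * s / (1 - s) maps [n, inf) onto [0, 1).
    """
    eps = (1 + 1 / p) * m
    h = 1 / points
    total = 0.0
    for idx in range(points):
        s = (idx + 0.5) * h
        k = n + eps * s / (1 - s)
        total += f(k) * eps / (1 - s) ** 2 * h
    return total


def average_clustering(p, m, n=0):
    """C_av = integral of p_k * C_i(k) dk."""
    return integrate_over_degrees(
        lambda k: p_k(k, p, m, n) * local_clustering(k, p, m, n), p, m, n)


print(average_clustering(1.0, 1))  # close to 2/3
```

For p = 1, m = 1, n = 0 the integrand reduces to 16/(k + 2)^4, so C_av = 2/3 exactly, which makes a convenient check of the quadrature.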




Figure 2. Local clustering coefficient C_i(k) for different values of α with m = 1 and 
n = 0 (left); and average clustering coefficient C_av for different values of p and m with 
n = 0 (right). 



4. Simulations 

To gain further insight into the network formation process, let N_0 = 12 and n = 0. 
Following similar ideas as in [22], [27], let the probability of establishing additional links 
due to triad formation be X_i(t) = 1 − …, where u captures the compatibility between 
nodes and is chosen from a uniformly random distribution with support on [0, 1] (i.e., 
the random variable X_t takes values X_i(t)). Let the parameter c, 0 < c < u, represent 
the cost of establishing additional links (here c = 0.1m). The expected value of X_t at 
time t is given by 

p_t = E[X_t] = ∫ … p_u p_k   (14) 

where p_u = … and p_k is the probability distribution of k_i(t) according to Theorem 1. 
According to (14), it can be shown that because … → 1 as t → ∞, the process of triad 
formation has stationary mean p = 1 for n ≥ 0 and m > 0. 

Figure 3 shows the in-degree distribution for different values of m at t = 10^…. For 
nodes with a low degree, the complementary cumulative degree distribution degenerates 
into the exponential form. In particular, the threshold ε = 2m characterizes the 
transition from an exponential (with λ = 3/(2m)) to a power law distribution (with α = 3). 

Figure 4 shows the value of the local clustering coefficient C_i(k) as a function of 
k for different values of m. Note that the asymptotic expression of C_i(k) tends to 
k^{−1} for values greater than approximately 2m. The fact that the clustering coefficient 
that characterizes the different nodes follows a scaling law reveals the hierarchical 
organization of the generated networks [11], [28]. Note that the scaling of C_i(k) emerges … 




Figure 3. Complementary cumulative distribution function of the in-degree 
distribution p_k on a logarithmic scale. The solid bottom curve represents the 
theoretical prediction according to (3) for m = 1; the dots represent simulation results. 
The two solid curves at the top represent predictions for m = 7 and m = 3. 




Figure 4. Local clustering coefficient C_i(k) as a function of the in-degree of a node. 
The solid top curve represents the theoretical prediction according to (13) for m = 3; 
the dots represent simulation results. The solid bottom curve represents the prediction 
for m = 5. 



5. Group preference 

To explore the formation of communities, consider the following modifications to the 
node attachment and triad formation processes. Let two types of nodes, denoted by 
δ ∈ {1, 2}, influence the formation of G_t. The variable δ_i specifies the type of node i. 
5.1. Node attachment 

Every time index t a new node attaches to m different nodes. The type δ_j of the new 
node j ∉ H_{t−1} takes value 1 with probability …. When node j attaches to the network, 
it connects to a node j' ∈ H_{t−1} of the same type (δ_j = δ_{j'}) with probability p_R (and 
with probability 1 − p_R to a node of a different type). 



5.2. Triad formation 

The probability X_i(t) that node j establishes an additional link to an outgoing neighbor 
of node j' (i.e., to some node i such that j' ∈ q_i(t)) is also influenced by their types 
(δ_j and δ_i). As before, X_i(t) evolves according to a multivariate random variable with a 
finite expected probability p_t, and {X_t} represents the random process associated with 
triad formation. Here, let 

$$X_i(t) = \begin{cases} p_A, & \text{if } \delta_j = \delta_i,\\ 1 - p_A, & \text{if } \delta_j \neq \delta_i, \end{cases}$$

where 0 < p_A < 1. If nodes j and i are of the same type, the process of triad 
formation has stationary mean p_A. Otherwise, it has stationary mean 1 − p_A. The 
left plot of Figure 5 shows the modularity Q for different values of p_R for a network 
with N_0 = 12, n = 0, m = 1, and p_A = 0.5 (each point represents 100 simulation runs 
with t = 10^…; error bars represent one standard deviation) [11]. The formation of 
non-overlapping communities is evident as p_R increases. The right plot of Figure 5 
illustrates the variation of the scaling exponent as p_A increases (with p_R = 1). Note that 
when p_A > 0.5 the scaling exponent decreases, which indicates that group preference 
starts to affect the power law behavior of the resulting network. 
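A minimal sketch of the two group-preference rules follows: `choose_target` draws one attachment target whose type matches the new node's type with probability p_R (uniformly within the chosen type), and `triad_probability` returns X_i(t) as defined above. It assumes both type pools are non-empty and leaves the bookkeeping for "m different nodes" to the caller; both function names are illustrative.

```python
import random


def choose_target(j_type, nodes_by_type, p_R, rng):
    """Pick one attachment target j' for a new node of type j_type (1 or 2).

    With probability p_R the target has the same type as the new node,
    otherwise the other type; the choice within a type is uniform.
    Assumes both entries of nodes_by_type are non-empty lists.
    """
    target_type = j_type if rng.random() < p_R else 3 - j_type
    return rng.choice(nodes_by_type[target_type])


def triad_probability(j_type, i_type, p_A):
    """X_i(t): p_A for nodes of the same type, 1 - p_A otherwise."""
    return p_A if j_type == i_type else 1 - p_A


rng = random.Random(7)
pools = {1: [0, 2, 4], 2: [1, 3, 5]}  # hypothetical nodes grouped by type
print(choose_target(1, pools, 0.9, rng), triad_probability(1, 2, 0.5))
```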








Figure 5. Modularity Q for different values of p_R (left); and the resulting scaling 
exponent α for different values of p_A (right). 



We characterize the relationship between modularity Q and the average clustering 
coefficient C_av in Figure 6. As indicated in Table 1, some parameter regimes produce 
linear relationships between Q and C_av (with different slopes for different values of p_R 
and p_A). For the relationships with a positive slope (i.e., l_1 to l_6), the model produces 
outcomes that resemble the empirical measures in [29]. 







Figure 6. Relationships between the modularity Q and the average clustering 
coefficient C_av for different values of p_R and p_A. Table 1 shows the parameter regime 
for the correlations between Q and C_av. 



Table 1. Parameter regime for the correlations in Figure 6 when n = 0. 

  Line     Slope      Range of p_R    p_A
  l_1       6.268     [0.90, 1]       0.3
  l_2       7.060     [0.90, 1]       0.4
  l_3       8.632     [0.90, 1]       0.5
  l_4      11.707     [0.90, 1]       0.6
  l_5      20.773     [0.90, 1]       0.7
  l_6      94.292     [0.80, 1]       0.8
  l_7     -23.726     [0.80, 1]       0.9
  l_8      -9.814     [0.70, 1]       1.0



6. The U.S. Supreme Court citation network 

We apply the proposed mechanism to generate network realizations that resemble both 
the degree distribution and clustering properties of the U.S. Supreme Court citation 
network (using data from 1754 to 2002). The citation network is created by a 
dynamic process in which the number of opinions grows over time as judges write 
opinions that cite cases. The structure of the network captures which opinions get 
cited by later opinions. The evolution of the empirical network yields the characteristic 
in-degree distribution shown in Figure 7, which illustrates how Supreme Court opinion 
citations are concentrated in a relatively small core of cases. 

Figure 8 shows the relationship between the complementary cumulative in-degree 
distribution of the empirical network and its theoretical counterpart (according to (3)). 



Figure 7. Complementary cumulative probability for the in-degree distribution of the 
U.S. Supreme Court citation network; data from [26]. 



We use model parameters N_0 = 12, n = 0, m = 6, and p = 0.43, which yield α = 4.32 
and ε = 19.92. The empirical complementary cumulative distribution 1 − P[k_i(t) < k] 
correlates with the theoretical prediction (3) with a Pearson correlation coefficient of 0.99. 






Figure 8. Relationship between the empirical and the theoretical prediction of the 
degree distribution of the U.S. Supreme Court citation network. 

Because the generated network does not rely on the dynamics of preferential 
attachment, it suggests that the formation of structure may be driven by the tendency 
to establish additional citations rather than by a "rich get richer" dynamic [30]. Cases may 
accumulate legal authority (measured as the number of citations) not because, having 
been cited approvingly by judges, they are more likely to be cited in the future. This 
implies that even if new citations are viewed as "random attachments," the scaling 
behavior will emerge, as long as each citation also refers to the opinions within the cases 
it cites. 

Next, Figure 9 illustrates the relationship between the empirical and the theoretical 
prediction of the local clustering coefficient (according to (13)). The empirical 
measure C_i(k) correlates with the theoretical prediction with a Pearson correlation 
coefficient of 0.88. When an opinion cites another one and the two opinions cite a 
third opinion (i.e., they form a triad that contributes to strong neighborhood clusters), it 
is a signal that these opinions (cases) are especially relevant to one another. These 
properties are an important source for discovering legal clusters that are tightly 
linked in terms of meaning and subject matter. 






Figure 9. Relationship between the empirical and the theoretical prediction of the 
local clustering coefficient of the U.S. Supreme Court citation network. 



7. Conclusions 

This paper introduces a mathematical framework that generates extended power law 
distributions with constant clustering coefficients based on a two-step mechanism: (i) 
during attachment, a newly added node links to a finite number of randomly selected 
nodes; and (ii) during triad formation, the new node may establish an additional link 
to one of the outgoing neighbors of each node it attaches to. The proposed mechanism is of 
interest because it helps explain the existence of extended power law networks with 
clustering properties that do not vanish as the size of the network grows. Generating 
network realizations with a desired scaling and clustering behavior allows us to evaluate 
which principles may lie behind the formation of relationships in large amounts of data. 
Moreover, our framework captures the effect of group preference in the formation of 
non-overlapping community structures. Analytical results about the processes that lead to 
overlapping community structures in clustered networks provide an important direction 
for future research. 